Towards Facilitating the Accessibility of Web 2.0 Texts through Text Normalisation
Authors
Abstract
Web 2.0, through platforms such as blogs, social networks, microblogs, and forums, allows users to write content freely on the Internet in order to provide, share, and use information. However, the non-standard features of the language used in Web 2.0 publications can make social media content less accessible than traditional texts. For this reason, we propose TENOR, a multilingual lexical approach for normalising Web 2.0 texts. Given a noisy sentence in either Spanish or English, our aim is to transform it into its canonical form so that it can be easily understood by any reader or by text simplification tools. Our experimental results show that TENOR is an adequate tool for this task, facilitating text simplification with current NLP tools when required and making Web 2.0 texts more accessible to people unfamiliar with these text types.
Similar resources
On Evaluating the Contribution of Text Normalisation Techniques to Sentiment Analysis on Informal Web 2.0 Texts / Evaluación de la Contribución de la Normalización al Análisis de Sentimiento en Textos Informales de la Web 2.0
The writing style used in social media usually contains informal elements that can lower the performance of Natural Language Processing applications. For this reason, text normalisation techniques have drawn a lot of attention recently when dealing with informal content. However, not all texts present the same level of informality, and some may not require additional pre-processing steps. Therefo...
Improving Web 2.0 Opinion Mining Systems Using Text Normalisation Techniques
A basic task in opinion mining deals with determining the overall polarity orientation of a document about some topic. This has several applications such as detecting consumer opinions in on-line product reviews or increasing the effectiveness of social media marketing campaigns. However, the informal features of Web 2.0 texts can affect the performance of automated opinion mining tools. These ...
DLSI en Tweet-Norm 2013: Normalización de Tweets en Español
Its lexical richness and ease of access to large volumes of information make the Web 2.0 an important resource for Natural Language Processing. Nevertheless, the frequent presence of non-normative linguistic phenomena can make any automatic processing challenging. This paper describes the participation in the Text Normalisation Workshop at the SEPLN conference (Tweet-nor...
Access Toolkit for Education
This paper describes three tools that have been developed to help overcome accessibility, usability and productivity issues identified by disabled students. The Web2Access website allows users to test any Web 2.0 site or software application against a series of checks linked to the WCAG 2.0 and other guidelines. The Access Tools accessible menu helps with navigation to portable pen drive applic...
Reading Performance of Iranian EFL Learners in Paper and Digital texts
Dependence on computers and internet has given birth to digital literacy. However, research into its influences on the reading process is still in its infancy. To fill the gap, this study was designed to investigate the ways in which text presentation mode (paper vs. digital) affects reading comprehension, as well as reading attitudes. To this end, a sample of 30 male and female English major s...